In [1]:
from rasterstats import raster_stats
elev = '/data/projects/murdock/zonal_fcid_test/dem_aea2_feet.tif'
polys = '/data/projects/murdock/zonal_fcid_test/fcid.shp'
stats = raster_stats(polys, elev, stats="*")
len(stats)
Out[1]:
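Each element of the returned list is a dictionary of statistics for one input feature. A quick way to see which statistics stats="*" actually computed is to inspect the keys of the first result (a minimal sketch, reusing the stats list from above):

# Inspect the keys available for the first feature's result dict
sorted(stats[0].keys())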
We can ask interesting questions such as "Which polygon has the highest standard deviation in elevation?" by sorting the list on the std key.
In [2]:
sorted(stats, key=lambda k: k['std'], reverse=True)[0]
Out[2]:
Let's see how this performs with a 4.8 MB raster and a 3.3 MB shapefile with > 20k polygons.
In [3]:
%timeit raster_stats(polys, elev)
Compare that to some alternatives:
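One alternative is to roll your own zonal statistics, for example by rasterizing the polygons onto the DEM grid and aggregating with NumPy. A minimal sketch of that approach (assuming rasterio and fiona are installed; this is illustrative rather than the exact tool benchmarked here):

import numpy as np
import rasterio
import fiona
from rasterio import features

with rasterio.open(elev) as src:
    dem = src.read(1)
    transform = src.transform
    nodata = src.nodata

with fiona.open(polys) as shp:
    # Pair each geometry with a 1-based zone index to burn into the grid
    shapes = [(feat['geometry'], i) for i, feat in enumerate(shp, start=1)]

# Rasterize all polygons once onto a grid aligned with the DEM
zones = features.rasterize(shapes, out_shape=dem.shape, transform=transform, fill=0)

valid = zones > 0
if nodata is not None:
    valid &= dem != nodata

for zone_id in np.unique(zones[valid]):
    zone = dem[valid & (zones == zone_id)]
    print(zone_id, zone.min(), zone.max(), zone.mean())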
Much faster, plus rasterstats is running in a VM in this case!
Let's try to optimize our raster_stats call by using global_src_extent=True. This will load the raster into memory once for the entire extent of the vector layer; fewer disk reads can mean better performance if raster access from disk is your limiting factor and you can fit the raster and the resulting temporary arrays into memory!
In [4]:
%timeit raster_stats(polys, elev, global_src_extent=True)
No improvement at all. This indicates that the raster data source is relatively quick to read from disk. For other formats (such as JPEG) or for networked data sources where the pixel values are slower to read, global_src_extent can improve performance.
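For example, pointing raster_stats at a raster served over HTTP through GDAL's /vsicurl/ virtual filesystem (hypothetical URL below) is exactly the kind of case where paying for one large read up front could pay off:

# Hypothetical remote raster read via GDAL's /vsicurl/ driver;
# one global read can beat thousands of small per-feature requests.
remote_elev = '/vsicurl/http://example.com/data/dem_aea2_feet.tif'
stats_remote = raster_stats(polys, remote_elev, global_src_extent=True)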